Modeling ETL Data Quality Enforcement Tasks Using Relational Algebra Operators

نویسندگان

  • Vasco Santos
  • Orlando Belo
چکیده

Usually, a data warehouse is refreshed periodically with data gathered from disparate source systems. Nevertheless this data might not be fully accurate, probably containing serious data quality problems, such as uniqueness, misrepresentations, null values, multi-purpose fields, or inconsistent values, for one or more attributes. This is a major contribution to the falling expectations users have on data analyzed from data warehouses. Data quality enforcement is a complex time consuming task that parses data from source tables and corrects it, normalizes it and integrates it into a data warehouse for a better representation of real businesses. In this paper, we analyze some of the common tasks that are associated with data quality enforcement, representing and modeling them using Relational Algebra as specification tool. © 2013 Published by Elsevier Ltd. Selection and/or peer-review under responsibility of CENTERIS/HCIST.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Extending Relational Algebra to express one-to-many data transformations

Application scenarios such as legacy-data migration, ETL processes, data cleaning and data-integration require the transformation of input tuples into output tuples. Traditional approaches for implementing these data transformations enclose solutions as Persistent Stored Modules (PSM) executed by an RDBMS or transformation code using a commercial ETL tool. Neither of these solutions is easily m...

متن کامل

Extending the Relational Algebra with the Mapper Operator

Application scenarios such as legacy data migration, Extract-TransformLoad (ETL) processes, and data cleaning require the transformation of input tuples into output tuples. Traditional approaches for implementing these data transformations enclose solutions as Persistent Stored Modules (PSM) executed by an RDBMS or transformation code using a commercial ETL tool. Neither of these is easily main...

متن کامل

Hybrid: A Large-scale In-memory Image Analytics Engine

Analytical image/video processing tasks such as scene/face/activity recognition are historically performed outside most relational database management systems. Relational engines are optimized for relational data, hence naturally have weaker support for non-relational data such as images or video. Hybrid, a high-velocity in-memory analytics engine, supports advanced access capabilities to both ...

متن کامل

From SPARQL to MapReduce: The Journey Using a Nested TripleGroup Algebra

MapReduce-based data processing platforms offer a promising approach for cost-effective and Web-scale processing of Semantic Web data. However, one major challenge is that this computational paradigm leads to high I/O and communication costs when processing tasks with several join operations typical in SPARQL queries. The goal of this demonstration is to show how a system RAPID+, an extension o...

متن کامل

An ETL Metadata Model for Data Warehousing

Metadata is essential for understanding information stored in data warehouses. It helps increase levels of adoption and usage of data warehouse data by knowledge workers and decision makers. A metadata model is important to the implementation of a data warehouse; the lack of a metadata model can lead to quality concerns about the data warehouse. A highly successful data warehouse implementation...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013